The MDL model choice for linear regression
Abstract
In this talk, we discuss the principle of Minimum Description Length (MDL) for problems of statistical modeling. By viewing models as a means of providing statistical descriptions of observed data, the comparison between competing models is based on the stochastic complexity (SC) of each description. The Normalized Maximum Likelihood (NML) form of the SC (Rissanen 1996) contains a component that may be interpreted as the parametric complexity of the model class. Once the SC of the data, relative to a class of candidate models, has been computed, it serves as a criterion for model selection: the optimal model is the one with the smallest SC. This is the MDL principle (Rissanen 1978, 1983) for model choice. When the parametric complexity of a model family is unbounded, the clean definition of the SC must be modified; the most important example of this phenomenon is the Gaussian family. One way to bound the parametric complexity is to constrain the sample space. We calculate the SC for Gaussian linear regression using the NML density and use it as a criterion for model selection. The final form of the selection criterion depends on the method used to bound the parametric complexity. In contrast to traditional fixed-penalty criteria, this approach yields adaptive criteria that have proved successful in certain applications.
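As a point of reference for the quantities named above, the following is a minimal LaTeX sketch of the NML density and the resulting stochastic complexity; the notation (x^n for the observed data, θ̂ for the maximum likelihood estimate) is assumed here and is not taken verbatim from the talk.

\[
\hat{f}_{\mathrm{NML}}(x^n) \;=\;
\frac{f\bigl(x^n;\hat{\theta}(x^n)\bigr)}{\displaystyle\int f\bigl(y^n;\hat{\theta}(y^n)\bigr)\,dy^n},
\qquad
\mathrm{SC}(x^n) \;=\;
-\log f\bigl(x^n;\hat{\theta}(x^n)\bigr)
\;+\;
\log \int f\bigl(y^n;\hat{\theta}(y^n)\bigr)\,dy^n .
\]

The second term is the parametric complexity of the model class. For the Gaussian family underlying linear regression this normalizing integral diverges unless the range of integration, i.e. the sample space, is constrained, which is why the final form of the selection criterion depends on how that constraint is imposed. The MDL principle then prefers the model class with the smallest SC(x^n).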
Similar Articles
Minimum Description Length Model Selection Criteria for Generalized Linear Models
This paper derives several model selection criteria for generalized linear models (GLMs) following the principle of Minimum Description Length (MDL). We focus our attention on the mixture form of MDL. Normal or normal-inverse gamma distributions are used to construct the mixtures, depending on whether or not we choose to account for possible over-dispersion in the data. For the latter, we use E...
Computing Minimum Description Length for Robust Linear Regression Model Selection
A minimum description length (MDL) and stochastic complexity approach to model selection in robust linear regression is studied in this paper. Computational aspects and the implementation of this approach for practical problems are the focus of the study. In particular, we provide both algorithms and a package of S language programs for computing the stochastic complexity and proceeding with the a...
The Family of Scale-Mixture of Skew-Normal Distributions and Its Application in Bayesian Nonlinear Regression Models
In previous studies on fitting non-linear regression models with a symmetric structure, normality is usually assumed in the analysis of the data. This choice may be inappropriate when the distribution of the residual terms is asymmetric. Recently, the family of scale-mixtures of skew-normal distributions has become the main concern of many researchers. This family includes several skewed and heavy-tailed d...
Model Selection and the Principle of Minimum Description Length
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov’s theory of algorithmic complexity, matured in the literature on info...
Exact Minimax Predictive Density Estimation and MDL
The problems of predictive density estimation with Kullback-Leibler loss, optimal universal data compression for MDL model selection, and the choice of priors for Bayes factors in model selection are interrelated. Research in recent years has identified procedures which are minimax for risk in predictive density estimation and for redundancy in universal data compression. Here, after reviewing ...